Recovery of three-dimensional structure from single-particle X-ray scattering of completely randomly oriented particles, predicted a few decades ago, has become a reality with the advent of the emerging X-ray free electron laser (XFEL) technology. With the world's first XFEL in operation since June 2009 at SLAC National Lab at Stanford, the very first experiments are being conducted on larger objects such as viruses. Many important structures in nature, such as helical viruses and deoxyribonucleic acid (DNA), consist of a helical repetition of biological subunits.

Hence, developing methods for reconstructing helical structures from collected XFEL data has been a top research priority. In this work we have developed a method for solving helical structures such as TMV (tobacco mosaic virus) from a set of randomly oriented simulated diffraction patterns, exploiting symmetry and a Fourier-space constraint on the diffraction volume.


I. INTRODUCTION 

In 1977 Kam [1] pointed out that correlated intensity fluctuations in the X-ray scattering of non-oriented identical particles contain useful structural information about the particles themselves. In a fluctuation scattering experiment, the radiation must be recorded on a time scale shorter than the rotational diffusion time [2]. Since XFELs are now in operation in the USA and elsewhere, developing useful techniques and algorithms for structure determination from randomly oriented diffraction patterns collected in diffract-and-destroy experiments is very important, not only from a technological point of view but also for finding clues to diseases and designing drugs to treat them. So far, structure determination of helical biological structures such as TMV [3] or DNA [4] has been done primarily by fiber diffraction, in which the helical structures are aligned along their long axes; this is tedious because entropy works against the alignment of the molecules. In this work we report a full 3D recovery of TMV, up to three repeating units, from a set of simulated diffraction patterns of randomly oriented TMV helices (using the atomic coordinates of the biological assembly of TMV deposited in the Protein Data Bank as 2TMV) in an XFEL diffract-and-destroy experiment.


II. BACKGROUND 


The scattered intensity distribution $I$ over the 3D reciprocal space of a molecule may be written, in spherical coordinates, as a spherical harmonic expansion, as shown by Saldin et al. [5]:

$$ I(q,\theta,\phi) = \sum_{LM} I_{LM}(q)\, Y_{LM}(\theta,\phi) \tag{1} $$

where $Y_{LM}(\theta,\phi)$ is a spherical harmonic ($L$ and $M$ are the usual angular momentum quantum numbers). In principle, the full three-dimensional (3D) structure of a biological object may be reconstructed provided the expansion coefficients $I_{LM}(q)$ can be recovered from the collected set of two-dimensional (2D) diffraction patterns of random orientations.

Each randomly oriented diffraction pattern represents a section through the 3D reciprocal space of the molecule. The intensities on the curved section of the Ewald sphere [5] may be labelled by the magnitude $q$ of the scattering vector and an azimuthal angle $\phi$. The angular cross-correlation function between the intensities on two different resolution rings $q$ and $q'$, averaged over a set of diffraction patterns, is defined as

$$ C_2(q,q',\Delta\phi) = \frac{1}{N_s}\sum_{s_j=1}^{N_s} I_{s_j}(q,\phi_n)\, I_{s_j}(q',\phi_n+\Delta\phi) \tag{2} $$

where $I_{s_j}(q,\phi_n)$ is the intensity of a pixel on the $s_j$-th diffraction pattern and $N_s$ is the number of randomly oriented diffraction patterns in the collected data set. Orientational averaging of the diffraction patterns in Eq. (2) is a reasonable assumption, since for a large number of diffraction patterns all orientations in SO(3) are equally likely [6]; consequently, the left-hand side of Eq. (2) is independent of the value of $\phi_n$ chosen on the right-hand side.

The intensity distribution on ring $q$ of the $s_j$-th diffraction pattern may be expressed as

$$ I^{(s_j)}(q,\phi) = \sum_{LMM'} D^{L}_{MM'}\, I_{LM}(q)\, Y_{LM'}(\theta(q),\phi) \tag{3} $$

where $D^{L}_{MM'}$ is the Wigner D matrix for the (unknown) orientation of the particle in that pattern. Similarly, the intensity distribution on ring $q'$ of the same pattern may be expressed as

$$ I^{(s_j)}(q',\phi') = \sum_{L'M''M'''} D^{L'}_{M''M'''}\, I_{L'M''}(q')\, Y_{L'M'''}(\theta'(q'),\phi') \tag{4} $$

Since the intensity is real, Eq. (4) may be replaced by its complex conjugate. Substituting Eqs. (3) and (4) into Eq. (2) and using the orthogonality of the Wigner D matrices over SO(3),

$$ \frac{1}{8\pi^2}\int_{SO(3)} d\omega\; D^{L}_{MM'}(\omega)\, D^{L'*}_{M''M'''}(\omega) = \frac{\delta_{LL'}\,\delta_{MM''}\,\delta_{M'M'''}}{2L+1} \tag{5} $$

allows the two spherical harmonics to be combined with the addition theorem

$$ \sum_{M'} Y_{LM'}(\theta(q),\phi)\, Y^{*}_{LM'}(\theta'(q'),\phi') = \frac{2L+1}{4\pi}\, P_L(\cos\gamma) \tag{6} $$

This substitution finally leads to the following simplification of the intensity correlation function [2]:

$$ C_2(q,q',\Delta\phi) = \frac{1}{4\pi}\sum_{L} B_L(q,q')\, P_L(\cos\gamma) \tag{7} $$

where

$$ \cos\gamma = \cos\theta(q)\cos\theta'(q') + \sin\theta(q)\sin\theta'(q')\cos(\Delta\phi) \tag{8} $$

and

$$ B_L(q,q') = \sum_{M} I_{LM}(q)\, I^{*}_{LM}(q') \tag{9} $$

FIG. 1: Schematic of the diffraction geometry ($k$ is the magnitude of the incident wave vector and $\Theta$ is the scattering angle), showing the relationship between the polar angle $\theta(q)$ and the scattering angle $\Theta(q)$ as functions of the magnitude $q$ of the scattering vector. Note that for a flat Ewald sphere $\theta = \pi/2$.

FIG. 2: Cross-sectional view of a single repeating unit of TMV, showing the 49 protein subunits protruding outward along the helix. The axes are real-space coordinates in Å.


A. Matrix Inversion

$B_L(q,q')$ can be obtained from $C_2(q,q',\Delta\phi)$, Eq. (7), which is purely a quantity obtained from the diffraction patterns, using a matrix inversion technique based on singular value decomposition (SVD): Eq. (7) is linear in the unknowns $B_L(q,q')$, so for each pair of rings it forms a linear system whose matrix $A$ contains the values $P_L(\cos\gamma)/4\pi$. The SVD of an $m \times n$ matrix $A$, where $m < n$, is the factorization of $A$ into the product of three matrices

$$ A = U\,\Sigma\,V^{T} \tag{10} $$

where $U$ ($m \times m$) and $V$ ($n \times n$) are orthonormal matrices satisfying Eqs. (11) and (12) (here $I$ denotes the identity matrix), and $\Sigma$ ($m \times n$) is a diagonal matrix whose only nonzero entries are the nonnegative singular values $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_m \ge 0$.

$$ UU^{T} = I \tag{11} $$

$$ VV^{T} = I \tag{12} $$


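The zero padding of the $\Sigma$ ($m \times n$) matrix can be carved out to leave its square diagonal block $\Sigma'$; inverting $\Sigma'$ (taking the reciprocal of each nonzero singular value) and padding the result with zeros gives the $n \times m$ matrix $\Sigma''$, from which $A^{-1}$ is finally obtained by Eq. (13):

$$ A^{-1} = (V^{T})^{-1}\,\Sigma''\,U^{-1} = V\,\Sigma''\,U^{T} \tag{13} $$

where the second form follows from the orthonormality of $U$ and $V$. For a non-square $A$ this is the Moore–Penrose pseudo-inverse.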
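A minimal NumPy sketch of this inversion for one ring pair follows. It builds the matrix $A$ of Eq. (7) from Eq. (8), assuming the Ewald-sphere relation $\cos\theta(q) = q/2k$ (so that $\theta = \pi/2$ for a flat sphere, as in Fig. 1), keeps only even $L$ (Friedel symmetry of the intensity is assumed), and inverts it with Eq. (13). The function names, the 1 Å wavelength and the synthetic $B_L$ values are hypothetical; this illustrates the SVD route, not the code used in this work.

```python
import numpy as np
from numpy.polynomial import legendre as npleg


def cos_gamma(q, q_prime, dphi, k):
    """Eq. (8), assuming the Ewald-sphere relation cos(theta(q)) = q / (2k)."""
    ct, ctp = q / (2.0 * k), q_prime / (2.0 * k)
    st, stp = np.sqrt(1.0 - ct**2), np.sqrt(1.0 - ctp**2)
    return ct * ctp + st * stp * np.cos(dphi)


def legendre_kernel(cg, L_values):
    """Matrix A with A[i, j] = P_{L_j}(cg[i]) / (4 pi), so that C2 = A @ B (Eq. (7))."""
    A = np.empty((cg.size, len(L_values)))
    for j, L in enumerate(L_values):
        coeffs = np.zeros(L + 1)
        coeffs[L] = 1.0  # coefficient vector that selects P_L alone
        A[:, j] = npleg.legval(cg, coeffs) / (4.0 * np.pi)
    return A


def svd_inverse(A, rcond=1e-10):
    """A^{-1} = V Sigma'' U^T (Eq. (13)); singular values below rcond * max are dropped."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # numpy returns s in descending order
    s_inv = np.zeros_like(s)
    keep = s > rcond * s[0]
    s_inv[keep] = 1.0 / s[keep]
    return (Vt.T * s_inv) @ U.T  # scale the columns of V, then multiply by U^T


# Hypothetical example: one ring pair, even L up to 20, wavelength 1 Angstrom.
L_values = list(range(0, 21, 2))
dphi = np.linspace(0.0, np.pi, 181)
cg = cos_gamma(q=0.20, q_prime=0.25, dphi=dphi, k=2.0 * np.pi / 1.0)
A = legendre_kernel(cg, L_values)

rng = np.random.default_rng(0)
B_true = rng.normal(size=len(L_values))  # synthetic B_L(q, q') values
C2 = A @ B_true                          # forward model, Eq. (7)
B_rec = svd_inverse(A) @ C2              # inversion, Eq. (13)
```

With measured data, the $\Delta\phi$ samples of $C_2(q,q',\Delta\phi)$ would replace the synthetic right-hand side; the `rcond` cut-off plays the role of discarding near-zero singular values before forming $\Sigma''$.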

Note that in the correlation method a large data set, consisting of a huge (ideally infinite) number of diffraction patterns, is reduced to a compact data set of quadratic functions of the Fourier shell coefficients, $B_L(q,q')$.
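The data reduction can be illustrated by evaluating Eq. (2) directly. The sketch below assumes that each pattern has already been interpolated onto a polar $(q,\phi)$ grid and stored in a NumPy array of shape $(N_s, N_q, N_\phi)$ (a hypothetical layout), and it also averages over every starting angle $\phi_n$, which the $\phi_n$-independence of Eq. (2) permits. The circular correlation is computed with the FFT correlation theorem; the function name and random data are illustrative only.

```python
import numpy as np


def angular_cross_correlation(patterns, iq, iqp):
    """Eq. (2) for rings iq and iqp, additionally averaged over the starting angle phi_n.

    patterns : array (N_s, N_q, N_phi) of intensities on a polar (q, phi) grid (assumed layout).
    Returns C2(q, q', dphi) at dphi = 2*pi*m/N_phi for m = 0 .. N_phi - 1.
    """
    n_phi = patterns.shape[-1]
    a = np.fft.fft(patterns[:, iq, :], axis=-1)
    b = np.fft.fft(patterns[:, iqp, :], axis=-1)
    # correlation theorem: sum_n a[n] b[n + m] = IFFT(conj(FFT a) * FFT b)[m] for real a
    per_pattern = np.fft.ifft(np.conj(a) * b, axis=-1).real / n_phi
    return per_pattern.mean(axis=0)  # average over the N_s diffraction patterns


# Hypothetical data: 200 patterns, 4 resolution rings, 64 azimuthal bins.
rng = np.random.default_rng(1)
patterns = rng.random((200, 4, 64))
C2 = angular_cross_correlation(patterns, 0, 2)  # 200 * 4 * 64 numbers reduced to 64
```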

However, recovering the $I_{LM}(q)$ coefficients from the known quadratic shell correlation $B_L(q,q')$ is a formidable mathematical and computational challenge, and developing such a method would in principle provide a general approach to solving structures with the correlation method. While such a method has yet to be developed, biological structures with certain symmetries, such as helical symmetry [7] or icosahedral symmetry [8], may be solved using methods based primarily on symmetry-assisted techniques [9].
TMV consists of repeating units of a biological building block [10], and each unit consists of 49 protein subunits spanning three turns of the helix (a cross-sectional view of a single repeating unit is shown in Fig. 2). This provides the clue that the allowed values of $M$ in Eq. (9) are $0, \pm 49, \pm 98$, etc.

Since $|M| \le L$, if we limit the expansion to $L_{max} = 48$, the only permitted value of $M$ is zero. Hence, for TMV, Eq. (1) may be modified as

$$ I(q,\theta,\phi) = \sum_{L} I_{L0}(q)\, Y_{L0}(\theta,\phi) \tag{14} $$

That is, up to $L_{max} = 48$ the TMV diffraction volume may be reconstructed from the $M = 0$ component alone. Since the radius of TMV is about 100 Å and each repeat unit [11] of TMV is $c = 69$ Å long, the usual angular-momentum cutoff relation [12], $q_{max} R = L_{max}$, permits a reconstruction of the TMV repeating unit up to $q_{max} \approx 0.3\ \text{Å}^{-1}$.
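Because $Y_{L0}(\theta,\phi) = \sqrt{(2L+1)/4\pi}\,P_L(\cos\theta)$ does not depend on $\phi$, Eq. (14) can be evaluated with Legendre polynomials alone. The following sketch (the function name and random coefficients are hypothetical) builds the $M = 0$ intensity on a grid of $\cos\theta$ values for $L_{max} = 48$; the result carries no azimuthal dependence by construction.

```python
import numpy as np
from numpy.polynomial import legendre as npleg


def intensity_m0(I_L0, cos_theta):
    """Eq. (14): I(q, theta) = sum_L I_L0(q) Y_L0(theta), Y_L0 = sqrt((2L+1)/(4 pi)) P_L(cos theta).

    I_L0 : array (N_q, L_max + 1), one row of M = 0 coefficients per shell q.
    Returns an array (N_q, len(cos_theta)); there is no phi argument because Y_L0 has none.
    """
    L = np.arange(I_L0.shape[1])
    norm = np.sqrt((2 * L + 1) / (4.0 * np.pi))
    # legval reads coefficients along axis 0, so pass one column per shell
    return npleg.legval(cos_theta, (I_L0 * norm).T, tensor=True)


# Hypothetical coefficients for 3 shells with L_max = 48, as in the text.
rng = np.random.default_rng(3)
I_L0 = rng.normal(size=(3, 49))
cos_theta = np.linspace(-1.0, 1.0, 201)
volume = intensity_m0(I_L0, cos_theta)  # shape (3, 201)
```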
FIG. 3: Diffraction patterns for various random orientations of TMV. Orientations for which the incident X-ray beam is perpendicular to the long axis of the particle (upper left) or at about ±45° to it (upper right, lower left) show layer-line features separated by $2\pi/c$, where $c = 69$ Å is the length of the TMV repeat unit. Orientations for which the long axis is nearly parallel to the incident X-ray beam do not show the $2\pi/c$ splitting of the layer-line features (lower right).


Eq. (9) holds the primary relation among the various Fourier shells, including the diagonal terms as well as the cross terms, in quadratic form in the reciprocal space of the molecule. To recover the $I_{L0}(q)$ coefficients from $B_L(q,q)$, a triple correlation function, Eq. (15), and the associated three-point angular correlation, Eq. (16), have been introduced:

$$ T_L(q) = \sum_{L_1 L_2} G(L_1 0, L_2 0; L 0)\, I_{L_1 0}(q)\, I_{L_2 0}(q)\, I_{L0}(q) \tag{15} $$

where $G$ is a Gaunt coefficient.


The ring triple correlation is related to the experimental three-point angular correlation $C_3$ as

$$ C_3(q,\Delta\phi) = \sum_{L} T_L(q)\, P_L[\cos(\Delta\phi)] \tag{16} $$
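For $M = 0$ the Gaunt coefficient reduces to $G(L_1 0, L_2 0; L 0) = \sqrt{(2L_1+1)(2L_2+1)(2L+1)/4\pi}\,\begin{pmatrix} L_1 & L_2 & L \\ 0 & 0 & 0 \end{pmatrix}^2$, and the Wigner 3j symbol with zero projections has a standard closed form. The sketch below uses these identities to evaluate Eqs. (15) and (16) for one shell; it writes $\cos\Delta\phi$ exactly as Eq. (16) does (the flat-Ewald-sphere case of Eq. (8)). The function names and random coefficients are hypothetical.

```python
import math

import numpy as np
from numpy.polynomial import legendre as npleg


def three_j_zero(l1, l2, l3):
    """Wigner 3j symbol (l1 l2 l3; 0 0 0) from its standard closed form."""
    J = l1 + l2 + l3
    if J % 2 or l3 > l1 + l2 or l3 < abs(l1 - l2):
        return 0.0  # odd J or triangle condition violated
    g = J // 2
    f = math.factorial
    root = math.sqrt(f(J - 2 * l1) * f(J - 2 * l2) * f(J - 2 * l3) / f(J + 1))
    return (-1) ** g * root * f(g) / (f(g - l1) * f(g - l2) * f(g - l3))


def gaunt_m0(l1, l2, l):
    """G(L1 0, L2 0; L 0): integral of Y_{L1 0} Y_{L2 0} Y_{L 0} over the unit sphere."""
    pref = math.sqrt((2 * l1 + 1) * (2 * l2 + 1) * (2 * l + 1) / (4.0 * math.pi))
    return pref * three_j_zero(l1, l2, l) ** 2


def triple_correlation(I_L0_q):
    """Eq. (15) for one shell: T_L(q) from the M = 0 coefficients I_L0(q), L = 0 .. L_max."""
    L_max = len(I_L0_q) - 1
    T = np.zeros(L_max + 1)
    for L in range(L_max + 1):
        for L1 in range(L_max + 1):
            for L2 in range(L_max + 1):
                T[L] += gaunt_m0(L1, L2, L) * I_L0_q[L1] * I_L0_q[L2] * I_L0_q[L]
    return T


def three_point_correlation(T, dphi):
    """Eq. (16): C3(q, dphi) = sum_L T_L(q) P_L(cos dphi)."""
    return npleg.legval(np.cos(dphi), T)


# Hypothetical M = 0 coefficients of one shell, L = 0 .. 8.
rng = np.random.default_rng(4)
I_L0_q = rng.normal(size=9)
T = triple_correlation(I_L0_q)
C3 = three_point_correlation(T, np.linspace(0.0, np.pi, 91))
```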

To further justify the $M = 0$ approximation for helical biological structures, note that their Fourier transform in 3D reciprocal space consists of layer discs and is most naturally expanded in reciprocal-space cylindrical coordinates $(R, \psi, \zeta)$ in terms of cylindrical harmonics [7, 13–15], as in Eq. (17):

$$ I(R,\psi,\zeta_\lambda) = \sum_{n}\sum_{n'} G_n(R,\zeta_\lambda)\, G^{*}_{n'}(R,\zeta_\lambda)\, e^{i(n-n')\psi} \tag{17} $$

where $G_n(R,\zeta_\lambda)$, the cylindrical harmonic associated with the $\lambda$-th layer line ($\zeta_\lambda = 2\pi\lambda/c$), is given by Eq. (18) (the permitted $n$ values for the various layer-line numbers are tabulated in Fig. 4):


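$$ G_n(R,\zeta_\lambda) = \sum_{k} i^{n} f_k\, J_n(R\, r_k)\, e^{i(\zeta_\lambda z_k - n\phi_k)} \tag{18} $$

where $J_n$ is the $n$-th order Bessel function, $f_k$ is the scattering factor of the $k$-th atom, and the sum over $k$ runs over the real-space cylindrical coordinates $(r_k, \phi_k, z_k)$ of the atoms of the helical structure.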
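Eq. (18) can be evaluated for a toy helix to see the selection rule at work. The sketch below is an assumption-laden illustration: it places one identical point scatterer per subunit on a 49-subunit, three-turn helix of hypothetical radius 80 Å with $c = 69$ Å, evaluates $J_n$ through its integral representation so that only NumPy is needed, and compares, on layer line $\lambda = 1$, an order that satisfies the helix selection rule ($1 = 3(-16) + 49$) with one that does not ($n = 0$). All names are hypothetical.

```python
import numpy as np


def bessel_j(n, x, n_tau=256):
    """Integer-order Bessel J_n(x) = (1/2pi) * integral_{-pi}^{pi} cos(n tau - x sin tau) dtau.

    The periodic trapezoid rule used here converges very fast while |n| + |x| << n_tau.
    """
    tau = np.linspace(-np.pi, np.pi, n_tau, endpoint=False)
    x = np.asarray(x, dtype=float)
    return np.cos(n * tau - x[..., None] * np.sin(tau)).mean(axis=-1)


def cylindrical_harmonic(n, R, zeta, f, r, phi, z):
    """Eq. (18): G_n(R, zeta) = sum_k i^n f_k J_n(R r_k) exp[i (zeta z_k - n phi_k)]."""
    return (1j ** n) * np.sum(f * bessel_j(n, R * r) * np.exp(1j * (zeta * z - n * phi)))


# Toy 49_3 helix (assumption): one point scatterer per subunit, 49 subunits in 3 turns.
c = 69.0                              # repeat length in Angstrom, from the text
k = np.arange(49)
phi_k = 2.0 * np.pi * 3.0 * k / 49.0  # azimuth advances by 3/49 of a turn per subunit
z_k = c * k / 49.0
r_k = np.full(49, 80.0)               # hypothetical radius in Angstrom
f_k = np.ones(49)
zeta_1 = 2.0 * np.pi * 1 / c          # layer line lambda = 1
G_allowed = cylindrical_harmonic(-16, 0.25, zeta_1, f_k, r_k, phi_k, z_k)
G_forbidden = cylindrical_harmonic(0, 0.25, zeta_1, f_k, r_k, phi_k, z_k)
```

For the allowed order the 49 phase factors all equal 1 and add coherently; for the forbidden order they are the 49th roots of unity and cancel, so $G_0$ vanishes to rounding error.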
FIG. 4: Helix selection rule of the TMV $49_3$ helix. The layer-line index $\lambda$ is shown along Y in bold. The allowed values of the cylindrical harmonic orders satisfy $n - n' = 49N$, where $N = 0, \pm 1, \pm 2, \dots$


B. Helix Selection Rule 

It has been shown [7, 13, 14] that the helical repetition of 49 protein subunits over three turns introduces a $49_3$ helical symmetry, which leads to the following helix selection rule for TMV in Fourier space (Fig. 4):

$$ \lambda = 3n \pm 49m \tag{19} $$

where $\lambda$ is the layer-line number, $n$ is the order of the cylindrical harmonic expansion, and $m$ is an integer.
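The selection rule can be enumerated directly. The helper below (a hypothetical name) lists the Bessel orders allowed on a given layer line of a $49_3$ helix; because $m$ runs over all integers, the $\pm$ in Eq. (19) is absorbed into the sign of $m$.

```python
def allowed_orders(layer_line, n_max, turns=3, units=49):
    """Bessel orders n, |n| <= n_max, with layer_line = turns*n + units*m for some integer m."""
    return [n for n in range(-n_max, n_max + 1) if (layer_line - turns * n) % units == 0]


# Lowest allowed order on the first few layer lines of the 49_3 helix.
lowest = {lam: min(allowed_orders(lam, 49), key=abs) for lam in range(7)}
```

The lowest allowed order matters most near the meridian: for example, layer line 3 admits $n = 1$, whereas layer line 1 requires $|n| \ge 16$. Any two orders on the same layer line differ by a multiple of 49, which is the content of Fig. 4.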
FIG. 5: Reconstructed diffraction volume for three units of TMV, showing the layer-disc pattern spaced by $2\pi/c$.

Because of the helical arrangement of the protein subunits, the layer-line intensity of TMV is expressed in terms of cylindrical harmonics, Eq. (17). On the other hand, because the particle orientations in an XFEL diffract-and-destroy experiment are random, the diffraction volume in the correlation technique is conveniently expressed in spherical geometry, Eq. (1). According to the helix selection rule,

$$ n - n' = M \tag{20} $$

and the only allowed values of $M$ for a $49_3$ helix are $49N$, where $N = 0, \pm 1, \pm 2, \dots$ (i.e., $M = 0, \pm 49, \pm 98, \dots$). Hence the $M = 0$ term generates the leading (first-order) harmonics in the expansion of the diffraction volume of TMV, reducing Eq. (9) to


$$ B_L(q,q') = I_{L0}(q)\, I_{L0}(q') \tag{21} $$

Eq. (21) allows the magnitudes of the diagonal Fourier shell coefficients to be recovered by taking the square root of the $B_L(q,q)$ coefficients, up to the unknown signs of those discrete coefficients. More specifically,

$$ I_{L0}(q) = \pm\left|\sqrt{B_L(q,q)}\right| \tag{22} $$

For certain biological samples with symmetry, for example TMV with its $49_3$ helical symmetry, extracting the structural information hidden in $B_L(q,q')$, obtained from a set of diffraction patterns from a free electron laser or an ultrabright synchrotron source, therefore reduces to determining the unknown signs in Eq. (22). A double phasing technique might be useful here: start with a good initial guess of the signs of these coefficients and then modify them iteratively based on reciprocal-space constraints. The additional challenges of this approach are the convergence to the correct diffraction volume and the recovery of the real-space object from the recovered diffraction volume by a standard phasing technique.

The very recent work of Donatelli et al. [16] introduces a multitiered iterative phasing (MTIP) algorithm based on a series of derived projection operators that iteratively modify the specified real-space constraints and match the data to external observations. This technique provides a framework that allows the extension of density modification techniques developed for crystallographic structure determination. Although this method does not require symmetry considerations, the quality of the structure determination depends on the accuracy of the data set as well as on the amount of known a priori information [16].

FIG. 6: The reconstructed single unit fits the PDB mesh structure within the outer capsid.


Another approach, based on the three-dimensional (3D) Zernike polynomial expansion model introduced by Liu et al. [17, 18], computes scattering profiles from fluctuation X-ray scattering data and demonstrates the feasibility of ab initio model reconstruction of nanoparticles from experimental data, validated by theoretical computations on several representative molecules. Although this model does not assume any symmetry constraint, it is computationally expensive [16].

On the other hand, known a priori information may also be used to build mathematical constraints in Fourier space, so that the diffraction volume is reconstructed more accurately and the structure is then recovered with a standard phasing technique [19, 20]. The model introduced in this work uses a priori information about the structure of the molecule, such as the internal diameter of TMV.
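The standard phasing technique of Refs. [19, 20] is charge flipping, which is also the algorithm named for the real-space reconstructions in Sec. IV. As an illustration only, the sketch below implements the basic iteration of the original algorithm [19] in NumPy: flip the sign of the density below a threshold $\delta$, then restore the measured Fourier moduli while keeping the new phases. The threshold, iteration count, random starting phases and toy 1D density are assumptions; the published algorithm treats $F(000)$ and unmeasured reflections more carefully than this sketch, which imposes the measured modulus everywhere, and convergence is not guaranteed.

```python
import numpy as np


def charge_flipping(F_amp, delta, n_iter=300, seed=0):
    """Basic charge-flipping iteration (simplified sketch after Oszlanyi and Suto [19]).

    F_amp : measured Fourier moduli |F| on a full FFT grid (any dimensionality).
    delta : flipping threshold, an assumed user-chosen parameter.
    """
    rng = np.random.default_rng(seed)
    F = F_amp * np.exp(2j * np.pi * rng.random(F_amp.shape))  # random starting phases
    for _ in range(n_iter):
        rho = np.fft.ifftn(F).real                # current real-space density
        g = np.where(rho < delta, -rho, rho)      # flip the sign of density below delta
        G = np.fft.fftn(g)
        F = F_amp * np.exp(1j * np.angle(G))      # keep new phases, restore measured moduli
    return np.fft.ifftn(F).real


# Toy 1D "structure" of a few sharp peaks (hypothetical), known only through |F|.
rho_true = np.zeros(128)
rho_true[[5, 20, 47, 80, 101]] = [1.0, 0.8, 1.2, 0.6, 1.0]
F_obs = np.abs(np.fft.fft(rho_true))
rho_rec = charge_flipping(F_obs, delta=0.1)
```

Each iteration ends by imposing the measured moduli, so the returned density is always consistent with $|F_{obs}|$; whether its phases are correct depends on convergence, which should be judged separately.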
III. PROPOSED MODEL

The radial electronic charge density of TMV may be represented, as an object constraint, by a 1D step model (model 1):

$$ \rho(r) = \begin{cases} 0, & r < R_1 \\ \rho_{\text{constant}}, & R_1 < r < R_2 \end{cases} $$

where $R_1$ and $R_2$ are the inner and outer radii of TMV. The inner core of TMV is primarily composed of RNA (ribonucleic acid), surrounded by protein subunits. Since the electronic charge density of RNA is relatively higher than that of the outer protein coat, the radial charge density variation of TMV may be modeled with a slowly varying 1D exponential term ($\xi \to 0$) as follows (model 2):

$$ \rho(r) = \begin{cases} 0, & r < R_1 \\ \exp(-\xi r), & R_1 < r < R_2 \end{cases} $$


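The scattered intensity is the squared modulus of the complex amplitude, i.e., of the Fourier transform (FT) of the electronic charge density $\rho(\mathbf{r})$ of the real-space object:

$$ I(\mathbf{q}) = |F(\mathbf{q})|^2 = \left|\int d^3r\, \rho(\mathbf{r})\, e^{i\mathbf{q}\cdot\mathbf{r}}\right|^2 \sim \left|\int_{R_1<r<R_2} d^3r\, \exp(-\xi r)\, e^{i\mathbf{q}\cdot\mathbf{r}}\right|^2 \tag{23} $$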


Here $\mathbf{r}$ denotes the real-space coordinate and $\mathbf{q}$ the corresponding reciprocal-space variable. When the object is rotated in SO(3), the radial variations in 3D real and reciprocal space are related through the angular momentum.

Of the two rotational degrees of freedom in reciprocal space, the polar variation is intrinsically included in the reconstruction via the quantum number $L$, and the azimuthal symmetry is imposed as described in Eq. (14).


Since the radial variation of $\rho(r)$ is the prime focus of the model, Eq. (23) may be treated as a 1D integral. In the limit $R_2 \to \infty$, the 1D Fourier transform of the slowly decaying exponential in $\rho$ can be written as

$$ T(k) = \int_{R_1}^{\infty} dr\, \exp(-\xi r)\, \exp(-ikr) \tag{24} $$


In the limit $R_1 \to 0$, the Fourier transform integral can be evaluated as follows:

$$ T(k) = \int_{R_1}^{\infty} dr\, e^{-\xi r} e^{-ikr} = \frac{e^{-(\xi+ik)R_1}}{\xi+ik} \;\xrightarrow{R_1 \to 0}\; \frac{1}{\xi+ik} = \frac{\xi}{\xi^2+k^2} - \frac{ik}{\xi^2+k^2} \tag{25} $$


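In the limit $\xi \to 0$, the real part of the Fourier transform, Eq. (25), tends to a $\delta$ function in reciprocal space at $k = 0$, since this Lorentzian keeps an area of $\pi$ for every $\xi > 0$ while its width shrinks to zero:

$$ \lim_{\xi \to 0} \frac{\xi}{\xi^2+k^2} = \pi\,\delta(k) \tag{26} $$

The imaginary part varies as $1/k$:

$$ \frac{k}{\xi^2+k^2} \;\xrightarrow{\xi \to 0}\; \frac{1}{k} \tag{27} $$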
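Equations (24)–(27) can be checked numerically. The sketch below (function names and parameter values are hypothetical, apart from $R_1 = 19$ Å, which is the TMV value quoted in the text) compares trapezoid-rule quadrature of Eq. (24) with the closed form $e^{-(\xi+ik)R_1}/(\xi+ik)$, and shows that the area under the real part of Eq. (25) at $R_1 = 0$ stays close to $\pi$ as $\xi$ decreases, which is the $\pi\delta(k)$ behavior of Eq. (26).

```python
import numpy as np


def T_closed_form(k, xi, R1):
    """Eq. (25) before the R1 -> 0 limit: exp[-(xi + ik) R1] / (xi + ik)."""
    s = xi + 1j * k
    return np.exp(-s * R1) / s


def T_quadrature(k, xi, R1, r_max, n=200_001):
    """Trapezoid-rule evaluation of Eq. (24), truncated at r_max (needs xi * r_max >> 1)."""
    r = np.linspace(R1, r_max, n)
    y = np.exp(-(xi + 1j * k) * r)
    return (r[1] - r[0]) * (y.sum() - 0.5 * (y[0] + y[-1]))


def lorentzian_area(xi, k_max=1.0e3, n=2_000_001):
    """Area under the real part xi / (xi^2 + k^2) of Eq. (25) with R1 = 0, over |k| <= k_max."""
    k = np.linspace(-k_max, k_max, n)
    y = xi / (xi**2 + k**2)
    return (k[1] - k[0]) * (y.sum() - 0.5 * (y[0] + y[-1]))


# The peak height 1/xi grows while the area stays close to pi: the pi*delta(k) limit of Eq. (26).
areas = {xi: lorentzian_area(xi) for xi in (1.0, 0.1, 0.01)}
```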


The existence of this $\delta$-like behavior in the reciprocal space of the molecule (note that the contribution from the imaginary part is unimportant), which arises from the step boundary of the real-space charge density, provides a very important clue regarding the positivity of the diagonal shell correlation, Eq. (22), for the resolution shell corresponding to $q = 2\pi/(N_{os} R_1)$, where $N_{os}$ is the Nyquist oversampling rate [21] (for TMV, $R_1 \approx 19$ Å). For twofold intensity-oversampled data ($N_{os} = 2$), Eq. (22) can be written as

$$ I_{L0}(q = \pi/R_1) = +\sqrt{B_L(q,q)}\,\Big|_{q = \pi/R_1} \tag{28} $$


Eq. (28) thus defines the decisive resolution shell, whose signs propagate to all the other shells. In support of this claim, the method described here is not limited to molecules with helical geometry; it has also been verified for other geometries.

To be more specific, $I_{L0}(q = \pi/R_1)$ is the decisive shell for determining the other resolution shells via the quadratic cross-correlation function $B_L(q,q')$ of Eq. (21), from which the three-dimensional (3D) Fourier map of the molecule is reconstructed.
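The sign propagation from the decisive shell can be sketched as follows. Under Eq. (21), $\operatorname{sign} B_L(q_{ref},q) = \operatorname{sign} I_{L0}(q_{ref})\,\operatorname{sign} I_{L0}(q)$, so fixing $I_{L0}(q_{ref}) > 0$ as in Eq. (28) fixes every other sign, while the magnitudes come from Eq. (22). The array layout, function name, $q$ grid and synthetic coefficients are hypothetical, and for any $L$ whose reference coefficient vanishes the sign is simply left undetermined (set to +).

```python
import numpy as np


def propagate_from_reference(B, i_ref, eps=1e-12):
    """Recover I_L0(q) from B_L(q, q') = I_L0(q) I_L0(q') (Eq. (21)), taking I_L0(q_ref) > 0 (Eq. (28)).

    B : array (N_L, N_q, N_q); B[l] is the shell-correlation matrix for the l-th L value.
    Returns an array (N_L, N_q) of signed coefficients.
    """
    diag = np.diagonal(B, axis1=1, axis2=2)
    magnitude = np.sqrt(np.clip(diag, 0.0, None))  # |I_L0(q)| from Eq. (22)
    sign = np.sign(B[:, i_ref, :])                 # sign of I_L0(q_ref) * I_L0(q)
    sign[magnitude[:, i_ref] <= eps] = 1.0         # reference carries no signal: sign undetermined
    return magnitude * sign


# Hypothetical test case: R1 = 19 Angstrom (from the text) and two-fold oversampling.
R1 = 19.0
q = np.linspace(0.02, 0.30, 15)                    # shell radii in 1/Angstrom (assumed grid)
i_ref = int(np.argmin(np.abs(q - np.pi / R1)))     # decisive shell q = pi / R1
rng = np.random.default_rng(2)
I_true = rng.normal(size=(25, q.size))             # synthetic I_L0(q) for L = 0, 2, ..., 48
I_true[:, i_ref] = np.abs(I_true[:, i_ref]) + 0.1  # positivity assumed by Eq. (28)
B = np.einsum("li,lj->lij", I_true, I_true)        # Eq. (21)
I_rec = propagate_from_reference(B, i_ref)
```

If the true reference coefficient were negative for some $L$, this procedure would return that whole $L$ component with flipped sign, which is why the positivity argument for the decisive shell carries the method.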


IV. RESULTS AND VALIDATION 


As an initial validation of the method, a set of roughly one thousand simulated diffraction patterns of TMV was generated with an optimized code for simulating diffraction patterns; this simulation was the most computationally intensive part of the project. The simulated diffraction patterns account for the contribution of many repeating units from a single-unit calculation via a shape transform factor. For some orientations the simulated diffraction patterns show layer-line features (Fig. 3) separated by $2\pi/c$, where $c$ is the length of a single repeat unit of TMV.
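A shape transform factor of this kind can be sketched as follows, assuming (as a simplification) that the repeat units are identical and stacked along the helix axis $z$ with spacing $c$, so that the many-unit amplitude is the single-unit amplitude multiplied by $\sum_{j=0}^{N-1} e^{i q_z j c}$. Its squared modulus peaks at $q_z = 2\pi m/c$, which is the origin of the layer-line splitting seen in Fig. 3. The function name and the choice $N = 3$ are illustrative, not taken from the simulation code used here.

```python
import numpy as np


def shape_transform(qz, c, n_units):
    """|sum_{j=0}^{N-1} exp(i qz j c)|^2 = sin^2(N qz c / 2) / sin^2(qz c / 2) for N repeats spaced by c."""
    x = 0.5 * np.asarray(qz, dtype=float) * c
    num = np.sin(n_units * x) ** 2
    den = np.sin(x) ** 2
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = num / den
    # at qz = 2*pi*m/c the ratio tends to N^2: the principal maxima that form the layer lines
    return np.where(np.isclose(den, 0.0), float(n_units) ** 2, ratio)


c = 69.0                               # TMV repeat length in Angstrom, from the text
qz = np.linspace(0.0, 3 * 2 * np.pi / c, 601)
S = shape_transform(qz, c, n_units=3)  # three stacked repeat units (assumed)
```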

From the set of simulated diffraction patterns, the intensity cross-correlation was first calculated using Eq. (2), and $B_L(q,q')$ was then recovered using the matrix inversion technique for non-square matrices. Once the decisive correlation shell had been correctly determined via the proposed model, the remaining shells were obtained via Eq. (21). Upon successful recovery of the shell correlation, the three-dimensional (3D) diffraction volume was reconstructed by Eq. (14). Fig. 5 shows the layer-disc reconstruction of the diffraction volume, in which the discs are separated by the characteristic reciprocal spacing $2\pi/c$.

The real-space electron density of one unit and of three units of TMV was reconstructed using a standard phasing algorithm based on the charge flipping method [19, 20]. The resolution of both reconstructions is about 13 Å.

To further assess the quality of the reconstruction, it was superposed on the PDB structure of a single TMV repeating unit. Fig. 6 shows that the reconstruction closely fits the helical repetition as well as the central hole of the TMV structure. The three-unit reconstruction recovers the helical grooves as well as the central hole of TMV, as shown in Fig. 7.

FIG. 7: Three-unit reconstruction of TMV. The reconstruction correctly recovers the helical turns as well as the central hole of TMV.


V. SUMMARY


In this work we have demonstrated a technique in which known real-space object constraints are used to determine a single reference resolution shell, after which the other Fourier shells are obtained via the Fourier shell cross-correlation, Eq. (21).

We successfully applied the method to reconstruct a single repeating unit (and three repeating units) of TMV from simulated XFEL diffraction patterns. Further improvement of the model might be useful for reconstructing objects such as the DNA double helix [4], or objects whose structures deviate partially from the case considered here, for example the coiled RNA core of TMV, which protrudes out of the TMV inner capsid. For a small deviation of the structure, we may consider the small variation in $B_L(q,q')$ of Eq. (9) to obtain

$$ \delta B_L(q,q') = \sum_{M}\left[\delta I_{LM}(q)\, I^{*}_{LM}(q') + I_{LM}(q)\, \delta I^{*}_{LM}(q')\right] \tag{29} $$

For a small deviation of the electron density from the originally calculated model, one might calculate $\delta B_L(q,q')$ to obtain the deviation in electron density directly, using a method similar to that introduced by Pande et al. [6].
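Eq. (29) is the first-order variation of Eq. (9), and a quick numerical check makes this concrete: perturbing hypothetical coefficients $I_{LM}$ by a small amount, the exact change in $B_L$ differs from Eq. (29) only by the second-order term $\sum_M \delta I_{LM}(q)\,\delta I^*_{LM}(q')$. Names and values below are illustrative.

```python
import numpy as np


def B_single_L(I_q, I_qp):
    """Eq. (9) for a single L: sum over M of I_LM(q) conj(I_LM(q'))."""
    return np.sum(I_q * np.conj(I_qp))


def delta_B_single_L(I_q, I_qp, dI_q, dI_qp):
    """Eq. (29): sum over M of dI_LM(q) conj(I_LM(q')) + I_LM(q) conj(dI_LM(q'))."""
    return np.sum(dI_q * np.conj(I_qp) + I_q * np.conj(dI_qp))


def random_complex(rng, size):
    """Hypothetical complex expansion coefficients."""
    return rng.normal(size=size) + 1j * rng.normal(size=size)


# Hypothetical coefficients for L = 4 (2L + 1 = 9 values of M) on two shells.
rng = np.random.default_rng(5)
I_q, I_qp = random_complex(rng, 9), random_complex(rng, 9)
eps = 1e-6
dI_q, dI_qp = eps * random_complex(rng, 9), eps * random_complex(rng, 9)

exact = B_single_L(I_q + dI_q, I_qp + dI_qp) - B_single_L(I_q, I_qp)
linear = delta_B_single_L(I_q, I_qp, dI_q, dI_qp)
```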


Acknowledgement 

This work is partly supported by the National Science Foundation Science and Technology Center (STC-1231306). I also acknowledge the UWM High Performance Computing Center (HPC) for the use of the Avi cluster.


* e-mail address: miraj.uddin@uwc.edu 

1 Z. Kam, Macromolecules 10, 927 (1977).

2 V. Elser, New Journal of Physics 13, 123014 (2011).

3 K. Namba and G. Stubbs, Science 231, 1401 (1986).

4 J. D. Watson and F. H. C. Crick, Nature (London) 171, 737 (1953).

5 D. K. Saldin, V. L. Shneerson, R. Fung and A. Ourmazd, J. Phys.: Condens. Matter 21, 134014 (2009).

6 K. Pande, P. Schwander, M. Schmidt and D. K. Saldin, Phil. Trans. R. Soc. B 369, 20130332 (2014).

7 H.-C. Poon, P. Schwander, M. Uddin and D. K. Saldin, Phys. Rev. Lett. 110, 265505 (2013).

8 D. K. Saldin, H.-C. Poon, P. Schwander, M. Uddin and M. Schmidt, Opt. Express 19, 17318 (2011).

9 R. A. Kirian, J. Phys. B: At. Mol. Opt. Phys. 46, 223001 (2012).

10 A. Klug, Phil. Trans. R. Soc. B 354, 531 (1999).

11 R. P. Millane, Acta Cryst. A47, 440 (1991).

12 Z. Kam, J. Theor. Biol. 82, 15 (1980).

13 W. Cochran, F. H. C. Crick and V. Vand, Acta Cryst. 5, 581 (1952).

14 A. Klug, F. H. C. Crick and H. W. Wyckoff, Acta Cryst. 11, 199 (1958).

15 D. K. Saldin, V. L. Shneerson, D. Starodub and J. C. H. Spence, Acta Cryst. A66, 32 (2010).

16 J. F. Donatelli, P. H. Zwart and J. A. Sethian, Proceedings of the National Academy of Sciences 112, 10286 (2015).

17 H. Liu, B. K. Poon, A. J. E. M. Janssen and P. H. Zwart, Acta Cryst. A68, 561 (2012).

18 H. Liu, B. K. Poon, D. K. Saldin, J. C. H. Spence and P. H. Zwart, Acta Cryst. A69, 365 (2013).

19 G. Oszlanyi and A. Suto, Acta Cryst. A60, 134 (2004).

20 G. Oszlanyi and A. Suto, Acta Cryst. A61, 147 (2005).

21 M. H. Hayes, IEEE Trans. Acoust. Speech Signal Process. 30(2), 140 (1982).



